Improving the Speed of LZ77 Compression by Hashing and Suffix Sorting

Authors

  • Kunihiko SADAKANE
  • Hiroshi IMAI
Abstract

Two new algorithms for improving the speed of LZ77 compression are proposed. One is based on a new hashing algorithm, named two-level hashing, that enables fast longest-match searching in a sliding dictionary; the other uses suffix sorting. The former is suitable for small dictionaries and significantly improves the speed of gzip, which uses a naive hashing algorithm. The latter is suitable for large dictionaries, which improve the compression ratio for large files. We also measure the compression ratio and speed of block sorting compression, which uses suffix sorting in its compression algorithm. The results show that LZ77 with two-level hashing is suitable for small dictionaries, LZ77 with suffix sorting is good for large dictionaries when fast decompression and efficient use of memory are necessary, and block sorting is good for large dictionaries.

Key words: LZ77, hash, gzip, suffix sorting, block sorting
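
For orientation, the longest-match search that the abstract sets out to accelerate can be sketched with a plain hash-chain match finder, roughly in the spirit of the gzip-style hashing the abstract calls naive. This is not the authors' two-level hashing, whose details are not given here. The function names, the 3-byte hash key, and the `window` and `max_chain` parameters are assumptions chosen for illustration, and the chains are never pruned, so this is a readability sketch rather than a memory-efficient implementation.

```python
from collections import defaultdict

MIN_MATCH = 3  # assumed minimum match length, as in deflate-style coders


def lz77_compress(data: bytes, window: int = 4096, max_chain: int = 32):
    """Greedy LZ77 parse: returns literals (int) and (distance, length) pairs."""
    chains = defaultdict(list)  # 3-byte key -> earlier positions, oldest first
    out = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        key = data[i:i + MIN_MATCH]
        if len(key) == MIN_MATCH:
            # Try the most recent candidates first, newest to oldest.
            for cand in reversed(chains[key][-max_chain:]):
                if i - cand > window:
                    break  # remaining candidates are even farther back
                length = 0
                while i + length < n and data[cand + length] == data[i + length]:
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - cand
        if best_len >= MIN_MATCH:
            out.append((best_dist, best_len))
            step = best_len
        else:
            out.append(data[i])
            step = 1
        # Register every position consumed so later searches can find it.
        for j in range(i, i + step):
            k = data[j:j + MIN_MATCH]
            if len(k) == MIN_MATCH:
                chains[k].append(j)
        i += step
    return out


def lz77_decompress(tokens) -> bytes:
    """Inverse of lz77_compress."""
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            dist, length = t
            for _ in range(length):
                out.append(out[-dist])  # byte-wise copy handles overlapping matches
        else:
            out.append(t)
    return bytes(out)
```

The cost of this baseline is dominated by walking long chains when many positions share the same 3-byte key, which is the kind of search cost a faster hashing scheme or a suffix-sorted dictionary aims to reduce. Decoding only copies bytes, which is why LZ77 decompression stays fast regardless of how the encoder searches.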

Similar resources

On the Suitability of Suffix Arrays for Lempel-Ziv Data Compression

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used nowadays. Regarding time and memory requirements, LZ encoding is much more demanding than decoding. In order to speed up the encoding process, efficient data structures, like suffix trees, have been used. In this paper, we explore the use of suffix arrays to hold the dictionary of the LZ encoder, and propose an algori...

Suffix Arrays - A Competitive Choice for Fast Lempel-Ziv Compressions

Lossless compression algorithms of the Lempel-Ziv (LZ) family are widely used in a variety of applications. The LZ encoder and decoder exhibit a high asymmetry, regarding time and memory requirements, with the former being much more demanding. Several techniques have been used to speed up the encoding process; among them is the use of suffix trees. In this paper, we explore the use of a simple ...

Time and Memory Efficient Lempel-Ziv Compression Using Suffix Arrays

The well-known dictionary-based algorithms of the Lempel-Ziv (LZ) 77 family are the basis of several universal lossless compression techniques. These algorithms are asymmetric regarding encoding/decoding time and memory requirements, with the former being much more demanding, since it involves repeated pattern searching. In the past years, considerable attention has been devoted to the problem ...

A Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression

Block sorting compression [1] has become common because of its good balance of compression ratio and speed. It has another nice feature, which is the relation between the encoding/decoding process and the suffix array. The suffix array [2] is a memory-efficient data structure for searching any substring of a text. It is an array of lexicographically sorted pointers to suffixes of a text. It is also used f...

Improving Text Indexes Using Compressed Permutations

Any sorting algorithm in the comparison model defines an encoding scheme for permutations. As adaptive sorting algorithms perform o(n lg n) comparisons on restricted classes of permutations, each defines one or more compression schemes for permutations. In the case of the compression schemes inspired by Adaptive Merge Sort, a small amount of additional data allows to support in good time the ac...

Publication date: 2000